Complex network approaches have recently been applied to continuous spatial dynamical systems, like 
climate, successfully uncovering the system's interaction structure. However, the relationship between the 
underlying atmospheric or oceanic flow's dynamics and the estimated network measures has remained 
largely unclear. We bridge this crucial gap in a bottom-up approach and define a continuous analytical 
analogue of Pearson correlation networks for advection-diffusion dynamics on a background flow. 
Analysing complex networks of prototypical flows and from time series data of the equatorial Pacific, we 
find that our analytical model reproduces the most salient features of these networks and thus provides a 
general foundation of climate networks. The relationships we obtain between velocity field and network 
measures show that line-like structures of high betweenness mark transition zones in the flow rather than, as 
previously thought, the propagation of dynamical information. 

Complex networks allow the study of the underlying interaction structures of dynamical systems where a 
detailed description of structure and dynamics may be impossible due to chaotic or otherwise complex 
behaviour 1-3. In recent years complex networks have also found fruitful application in climate science, where 
the reduction of an inaccessibly complex continuous system to a discrete complex network helps to find 
large-scale interaction structures that cannot be found with conventional methods. Climate networks have 
provided important insights regarding various questions in climate sciences, ranging from the impact of the 
El Niño Southern Oscillation on global climate 4-6 to the dynamics of the Asian monsoons 7,8, ocean 9,10 and 
atmospheric dynamics 11,12. Usually, these spatially embedded climate networks are constructed from nodes 
corresponding to (geographical) locations, and links corresponding to statistical interdependence between 
climate time series observed at the locations of the nodes 4,8,13-15. The strength of statistical dependence is 
often 9,6,10,15 quantified using Pearson correlation 16. Such statistical interdependences between climatic 
variables can occur due to a common driver (e.g. solar forcing) or be the result of a physical connection, which 
can be direct (e.g. ocean flows) or more indirect (e.g. teleconnections 17). In our model, we focus on direct 
local and causal connections, where statistical interdependences imply the existence of dynamical exchanges 
of energy and matter mediated by the underlying flow field. In the real-world application to the Pacific Ocean 
we choose a region and time window such that common drivers and indirect connections are unlikely to play a 
dominant role. 

While physical mechanisms were indicated in some studies 11,7,9, the dynamical processes behind the obtained 
networks, and the relationship between the underlying flow's dynamics and the network measures, have 
remained largely unclear. Donges et al. 9 observed in the climate network of global surface air temperature a 
striking resemblance between line-like structures of high betweenness and the locations of major ocean currents. 
They hypothesized that this "backbone of the climate network" resulted from atmospheric-oceanic coupling and 
the fact that "surface ocean currents play a major role in the energy and information transfer in the climate 
system". In this letter, we propose a fundamental analytical model to study transport mechanisms in synoptic 
currents, with which we test this hypothesis and bridge the gap in reasoning between the system's dynamics and 
network results. 

Instead of reconstructing the network from observations as in previous climate network studies, we compute 
the network directly and analytically from the underlying flow field and thus connect flow dynamics to the 
topology of networks. In spirit, this bottom-up approach, from a model to the network, is similar to refs 18,19 
but applies to very different systems. We provide a general framework that allows the inference of the topology 
of networks based only on the velocity field in the fluid. This way, the network approach can be used to compare 
the network imprint of the dynamics of the real-world climate system to that of a model system displaying an 
idealized flow with controlled dynamics. 
In climate, advection and diffusion in atmospheric and oceanic 
flows are amongst the most important mechanisms by which 
dynamics are mediated. Temperature dispersion in such flows is 
governed by the advection-diffusion equation (ADE). As an 
abstraction of local random fluctuations in the temperature field on 
such a flow, we compare the decay of temperature δ-peaks over time 
throughout the flow. We use them to define a cross-correlation 
analogue (CCA), given in equation (3), as the continuous normed scalar 
product between the temperature development of a tracer peak, 
evaluated at locations $x_1$ and $x_2$. We apply this method to derive 
networks i) analytically for a homogeneous and a circular flow, ii) 
from two more complex paradigmatic flows (using an appropriate 
approximation for the solution of the differential equation) and iii) 
from observed surface temperature data from the equatorial Pacific 
(using Pearson correlation), and investigate the relationships between 
the properties of the velocity field of the underlying flow and the 
observed network. 




Figure 1 | The correlation network (black) computed from the given 
velocity field (red arrows) for two flow fields: (a) counter-currents, (b) 
crossing currents. For better visibility, a low link density of 2 percent was 
chosen. The networks display longer links in flow direction and a higher 
link density in regions with higher velocity. 



Results 

Application to paradigmatic flows. We constructed flow networks 
analytically for homogeneous and circular flows and found that the 
links are longer in flow direction than perpendicular to it (see the 
supplementary material (SM)). For more complex flows v(x) we 
have to use an approximate solution to the ADE, as a direct 
analytic derivation is no longer possible. We assume that these 
stationary flows vary slowly over space, $|\nabla v(x)| \ll \chi$ (we have used 
$l = 1$ throughout this paper), and all derivatives of the velocity field 
are ignored in the following. This is necessary to ensure the 
applicability of the approximation. In d dimensions this gives us 
the approximated temperature field 

$$T_{\mathrm{appr}}(x,t;x_0) = \frac{1}{(4\pi\chi t)^{d/2}} \exp\left(-\frac{|x - x_0 - v(x)t|^2}{4\chi t}\right), \qquad (1)$$

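where $\chi$ is the diffusivity of the fluid. To evaluate the validity of this 
assumption for a given velocity field v(x), we compute a diagnostic 
residual R from: 

$$R(x,t) = \frac{\partial T_{\mathrm{appr}}}{\partial t} - \chi \Delta T_{\mathrm{appr}} + \nabla\cdot\left(v(x)\,T_{\mathrm{appr}}\right). \qquad (2)$$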

R(x,t) is zero for a perfect solution, which is the case if $|\nabla v(x)| = 0$. If 
the maximum of this function is small compared to the other terms 
in equation (2), $R(x,t) \ll \partial T_{\mathrm{appr}}/\partial t$, the approximation is considered to be 
good. For the velocity functions used in this paper this is indeed the 
case. The spatial integration for the norm is a simple Gaussian 
integration (SM). 
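The residual diagnostic can be illustrated with a small numerical sketch. The code below is not the authors' implementation: it assumes a one-dimensional domain (d = 1), a diffusivity of 1, central finite differences with hypothetical step sizes `h` and `dt`, and two made-up velocity profiles, `uniform` and `sheared`. It evaluates the approximate Gaussian field with the local velocity v(x) and estimates R at a single point.

```python
import math

CHI = 1.0  # diffusivity (assumed value for this sketch)

def t_appr(x, t, v, x0=0.0, chi=CHI):
    """Approximate 1D temperature field: a Gaussian advected with the local velocity v(x)."""
    shift = x - x0 - v(x) * t
    return math.exp(-shift ** 2 / (4.0 * chi * t)) / math.sqrt(4.0 * math.pi * chi * t)

def residual(x, t, v, x0=0.0, chi=CHI, h=1e-3, dt=1e-4):
    """Central-difference estimate of R = dT/dt - chi * d2T/dx2 + d(v T)/dx."""
    T = lambda xx, tt: t_appr(xx, tt, v, x0, chi)
    dT_dt = (T(x, t + dt) - T(x, t - dt)) / (2.0 * dt)
    d2T_dx2 = (T(x + h, t) - 2.0 * T(x, t) + T(x - h, t)) / h ** 2
    dvT_dx = (v(x + h) * T(x + h, t) - v(x - h) * T(x - h, t)) / (2.0 * h)
    return dT_dt - chi * d2T_dx2 + dvT_dx

def uniform(x):
    # Homogeneous flow: the approximate field is an exact solution here.
    return 0.5

def sheared(x):
    # Hypothetical flow with a deliberately large velocity gradient.
    return 0.5 + 0.3 * x

r_uniform = residual(1.0, 1.0, uniform)
r_sheared = residual(1.0, 1.0, sheared)
```

For the uniform profile the residual vanishes up to discretisation error, while the sheared profile leaves a residual of order 10^-2 at this point, which is the kind of signal the diagnostic is meant to expose.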

Then the correlation function takes the form (derivation in the 
Methods section): 

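$$C(x_1,x_2) = \frac{\displaystyle\int_{t_0}^{t_1}\!\!\int \frac{1}{(4\pi\chi)^2\, t\,(t+t_l)} \exp\left(-\frac{S}{4\chi t}\right) dx_0\, dt}{\dfrac{1}{8\pi\chi}\left(\log(t_1) - \log(t_0)\right)}, \qquad (3)$$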

where 

$$S = |x_1 - x_0 - v(x_1)t|^2 + \frac{t}{t+t_l}\,|x_2 - x_0 - v(x_2)(t+t_l)|^2$$

and $t_l$ is defined in equation (9). We now compute the correlations on 
a grid and connect any pair of sites with a correlation larger than $\alpha$ 
with a "link". We determine the threshold $\alpha$ such that the link density 
$\rho$ is constant, $\rho = L_{\mathrm{net}}/L_{\mathrm{full}}$, where $L_{\mathrm{net}}$ denotes the number of links in the 
flow network and $L_{\mathrm{full}}$ denotes the number of links in the fully 
connected graph with the same nodes. We choose a value for $\rho$ such that 
the network has almost no isolated nodes and is sufficiently far from 
being fully connected. We find this to be the case in a large range of 
values for $\rho$ and, out of those, we choose $\rho = 0.2$. Our results are 
robust for a large range of link densities (see section S3 in the SM). 
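Choosing the threshold so that a prescribed link density results can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the list-of-lists matrix layout and the small example matrix are assumptions made here. The threshold is placed midway between the weakest kept and the strongest rejected correlation, which yields exactly the requested density as long as there are no ties at the cut.

```python
def threshold_for_density(corr, rho):
    """Return a threshold alpha such that a fraction rho of node pairs has corr > alpha."""
    n = len(corr)
    # Collect each unordered pair once (upper triangle, diagonal excluded).
    values = sorted((corr[i][j] for i in range(n) for j in range(i + 1, n)), reverse=True)
    n_links = round(rho * len(values))
    if not 0 < n_links < len(values):
        raise ValueError("rho must select at least one link and fewer than all pairs")
    # Midpoint between the weakest kept and the strongest rejected correlation.
    return 0.5 * (values[n_links - 1] + values[n_links])

def adjacency(corr, alpha):
    """Link two distinct nodes whenever their correlation exceeds alpha."""
    n = len(corr)
    return [[1 if i != j and corr[i][j] > alpha else 0 for j in range(n)] for i in range(n)]

def link_density(adj):
    """Number of links divided by the number of links in the fully connected graph."""
    n = len(adj)
    n_links = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return n_links / (n * (n - 1) / 2)

# Hypothetical 4-node correlation matrix, for illustration only.
corr = [
    [1.0, 0.9, 0.4, 0.1],
    [0.9, 1.0, 0.8, 0.3],
    [0.4, 0.8, 1.0, 0.7],
    [0.1, 0.3, 0.7, 1.0],
]
alpha = threshold_for_density(corr, rho=0.5)
adj = adjacency(corr, alpha)
```

Fixing the density rather than the threshold makes networks from different flows comparable, because every network then has the same number of links.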

We compute networks analytically for the homogeneous case 
(SM) and, using numerical integration, for two basic, low-gradient 
velocity fields given in Fig. 1, where i) one is composed of three 
narrow parallel flows with alternating directions, and ii) the other 
is made up of two narrow flows intersecting in the middle. The 
resulting networks and underlying flows are illustrated in Fig. 1. 
Please note that the image resolution is equal to the grid resolution 
in all network figures. In areas of the flow with a higher velocity, the 
resulting networks show a higher density and length of links than in 
slower regions. We analyze these networks using the network 
measures degree $k_i$ (equation (14)) and betweenness centrality $b_i$ 
(equation (15)), in order to find relationships between them and the 
underlying velocity field. The network measures are given in 
Figs. 2 and 3. 

We mainly find that high absolute velocity coincides with high 
node degree. For low velocities, degree and flow speed are 
approximately proportional; for higher speeds a saturation occurs due to the 
finite size of the grid (see Fig. S.2 in SM). High values of shortest-path 
betweenness occur in the transition zones between regions of (in our 
case) opposing flow directions (Fig. 2) or of distinctly different flow 
velocities (Fig. 3). In both cases, the regions of highest betweenness 
outline the underlying velocity field. The position of the high 
betweenness zone depends on the value of the threshold (link 
density): a lower threshold increases the size of the well-connected region 
and pushes the transition zone further out. 

Other network measures such as local clustering coefficient or 
local assortativity 3 yield structures similar to that of the node degree 
(results not shown). 

Application to ocean data. In the next step, we compute correlation 
networks from sea surface temperatures (SST) in the tropical Pacific 
and compare them with the measured velocity field of the ocean currents 
(see data description in Methods). Flow velocity, gradient, and the 
obtained network measures are given in Fig. 4. To suppress 
turbulent effects, we use only the longitudinal component of the 
gradient. As for the paradigmatic flows investigated earlier, we 
find a reasonable agreement between the absolute values of 
the velocity field and the degree in the correlation network (Fig. 4 (a) 
and (b), also Fig. S.3 in SM). Again, the degree is maximal where the 
current's velocity is maximal, and the betweenness shows large values in 
regions with large values of the absolute gradient of the longitudinal 
velocity field (Fig. 4 (c) and (d)), thereby confirming the results 
obtained for the paradigmatic flows. 
Figure 2 | Flow field and network measures for the counter-currents in Fig. 1(a). (a) The normed degree relates to (b) the absolute value of the 
flow's local velocity; (c) the maxima of the normed betweenness are co-located with (d) the maxima of the absolute value of the gradient of the 
absolute current velocity. See equations (14) and (15) for definitions of the network measures. 



Discussion 

In this paper we have established a connection between data 
networks and the underlying physical system. The approach can 
easily be generalized beyond 2D static flows and to flow systems 
outside of climate science, as temperature can be replaced by any 
quantity described by the heat equation, such as density or 
chemical concentrations. In multivariate settings, reaction, advection 
and diffusion processes could be studied simultaneously. Given 
sufficient computing resources, non-stationary flows v(x,t) could 
be treated similarly, using a time offset for integration range and 
peak appearance, as the ADE can still be solved analytically for 
time-dependent velocity fields. This could give new insights into the 
dynamics of evolving flows, highly valuable not only in the 
analysis of changing climates. 



The line-like structures in the betweenness fields of global climate 
networks 9 were previously attributed to "information flow" in 
underlying ocean currents. We found that regions of high 
betweenness outline the flow rather than tracing it. Our results therefore 
suggest a correction of the former interpretation: high betweenness 
occurs in transition zones between regions of different magnitude or 
direction of the underlying velocity. This qualitative observation can 
be seen when comparing the betweenness with the absolute gradient. 
Physically, this could be due to the fact that advection dominates in 
fast flowing regions, which results in a higher parallel but lower 
perpendicular link density compared to the stagnant case. At the same 
time, we observe a correlation between regions with a high node degree 
and high average current velocity. Considering the advective-diffusive 
nature of these surface currents, a physical explanation could be that a 
fast flow transports the signal farther. 

Figure 3 | Flows and network measures for the crossing currents, see caption of Fig. 2. 

Figure 4 | Network measures of the correlation network of the equatorial counter-currents from 1997 daily anomaly SST data in comparison 
with flow velocity and gradient. The region of highest degrees coincides with the region of highest flow velocity, while the regions of highest betweenness 
coincide with the highest velocity gradient. 

We find that both the degree and the betweenness increase 
marginally along the flow direction. This can be understood as follows: 
signals from the slow flowing region first travel by diffusion; once they 
hit the fast region, their main peak travels downstream (the 
trajectory approximately follows the red arrow in Fig. 5). As a result, 
points downstream in the fast flowing area have connections even 
to points in the slow region upstream from them, leading to increased 
degree and betweenness there. 

In future research, such idealized case studies may be highly useful 
to study the influence of spatial embedding, and to test hypotheses 
concerning the dynamics of observed correlation networks. Given 
sufficiently low-gradient flow data, this method can be used to 
construct correlation networks from observed oceanic or atmospheric 
flows. 

Figure 5 | Schematic illustration of flow properties that result in 
distinctive network properties (regions labelled in the figure: 
advection-dominated region, mixed region, diffusion-dominated region). 
While advection dominates the transport 
of temperature fluctuations in regions of fast propagation, localized 
diffusion dominates in stagnant regions. Signals that leave the stagnant 
area by diffusion through the mixed region are subsequently transmitted 
along the flow. This leads to the asymmetry seen in the betweenness, where 
the betweenness values rise in flow direction. 

We have shown how correlation networks can be constructed 
directly from flow fields and given an example of how to use these 
networks to interpret network measures. We thereby provide a 
foundation for climate network analysis and bridge the gap between 
the dynamics of underlying flows and climate network interpretation. 

Methods 

Definition of continuous cross-correlation analogue. For the model system we 
assume stationary two-dimensional flows in a square area in a two-dimensional 
boundaryless fluid of constant diffusivity $\chi$ described by the velocity field v(x). The 
ADE states how the change of temperature over time is governed by the spatial 
temperature change and the velocity: 



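$$\frac{\partial T}{\partial t} = \chi \Delta T - \nabla\cdot\left(v(x)\,T\right), \qquad (4)$$

and is obtained by inserting the advective and diffusive flux 

$$j = j_{\mathrm{diff}} + j_{\mathrm{adv}} = -\chi \nabla T + vT \qquad (5)$$

into the sourceless continuity equation for temperature 

$$\frac{\partial T}{\partial t} = -\nabla\cdot j. \qquad (6)$$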
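Written out, the substitution of the flux into the continuity equation is a single step. The line below is a worked restatement using only the symbols already defined in the text, and it relies on the stated assumption that the diffusivity is constant in space, so that it can be pulled out of the divergence:

```latex
\frac{\partial T}{\partial t}
  = -\nabla\cdot j
  = -\nabla\cdot\left(-\chi\,\nabla T + v\,T\right)
  = \chi\,\Delta T - \nabla\cdot\left(v(x)\,T\right)
```

For a spatially varying diffusivity the first term would instead read $\nabla\cdot(\chi\nabla T)$, which is outside the setting considered here.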



Here, T(x,t) is the value of the temperature at position x at time t. We use a δ-peak as 
a tracer of the flow, analogous to local temperature fluctuations. It is inserted at an 
arbitrary point $x_0$ in the fluid as the initial condition; in other words, we solve the 
Cauchy problem of equation (4) with the initial condition 

$$T(x,0;x_0) = \delta(x - x_0). \qquad (7)$$



Analogous to the commonly used Pearson correlation 16, we define the continuous 
cross-correlation analogue (CCA) as the normed scalar product of solutions of the 
Cauchy problem of the ADE at two points $x_1$ and $x_2$ 

$$C(x_1,x_2) = \frac{\langle T(x_1,t;x_0),\, T(x_2,t+t_l;x_0)\rangle}{\|T(x_1,t;x_0)\|\,\|T(x_2,t+t_l;x_0)\|}. \qquad (8)$$



The time lag $t_l$ is the difference in travel time of the peak from $x_0$ to $x_1$ and $x_2$ (the norm 
is defined in the SM in equation (S.1)), 

$$t_l = t_{\max}(x_1,x_0) - t_{\max}(x_2,x_0), \qquad (9)$$



where $t_{\max}(x,x_0)$ is the time when the temperature at x reaches its maximum, with the 
initial peak starting at $x_0$. The scalar product is then defined as the integral over time 
and peak position $x_0$. The time integration is analogous to the sum over time steps; the 
integration over the peak position is an integral over realizations of the peak, 
corresponding to stochastics in the time series, where peaks appear at random in 
arbitrary places. So we define the CCA in this context as: 

$$C(x_1,x_2) = \frac{\displaystyle\int_{t_0}^{t_1}\!\!\int_{\mathbb{R}^2} T(x_1,t;x_0)\,T(x_2,t+t_l;x_0)\,dx_0\,dt}{\|T(x_1,t;x_0)\|\,\|T(x_2,t+t_l;x_0)\|}, \qquad (10)$$



where $x_0$ is the position of the peak. The lower limit of the integration, $t_0$, is chosen 
small but non-zero (here $t_0 < 10^{-2}$), as the correlation function is not defined for t = 0. 
The upper limit is chosen such that all temperature profiles have decayed to a value 
very close to zero (here: $t_1 = 5000$). 

Network construction. We evaluate the CCA on a regular grid between all pairs of 
grid-points. This provides the correlation matrix $C_{ij}$ from which the adjacency matrix 
A is constructed by choosing a fixed significance threshold $\alpha$ (see the Results 
section). This can be expressed with the Heaviside $\Theta$ function and Kronecker $\delta$ as 

$$A_{ij} = \Theta(C_{ij} - \alpha) - \delta_{ij}. \qquad (11)$$



For any given flow field, we first have to solve the ADE (equation (4)) and use the 
result to compute the correlation matrix using equation (10). The Cauchy problem of 
the ADE in d dimensions can be solved as 

$$T(x,t;x_0) = \frac{1}{(4\pi\chi t)^{d/2}} \exp\left(-\frac{|x - x_0 - vt|^2}{4\chi t}\right) + T_0 \qquad (12)$$

in the homogeneous case with the velocity field v(x) = v (ref. 20). This solution takes its 
maximum value at 

$$t_{\max}(x,x_0) = \frac{-d\chi + \sqrt{d^2\chi^2 + |v|^2\,|x - x_0|^2}}{|v|^2}. \qquad (13)$$
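The time of the temperature maximum can be checked numerically against the homogeneous solution. The sketch below is an illustration under stated assumptions, not the authors' code: it sets the constant offset T_0 to zero, uses d = 2 and a diffusivity of 1, and picks an arbitrary displacement and velocity. Setting the time derivative of the Gaussian solution to zero gives the quadratic |v|^2 t^2 + 2 d chi t - |x - x_0|^2 = 0, whose positive root is implemented in `t_max`; a brute-force scan over a time grid serves as the independent check.

```python
import math

def temperature(r, t, v, chi=1.0, d=2):
    """Homogeneous-flow solution for displacement r = x - x0, without the offset T_0."""
    dist2 = sum((ri - vi * t) ** 2 for ri, vi in zip(r, v))
    return math.exp(-dist2 / (4.0 * chi * t)) / (4.0 * math.pi * chi * t) ** (d / 2.0)

def t_max(r, v, chi=1.0, d=2):
    """Positive root of |v|^2 t^2 + 2 d chi t - |r|^2 = 0, i.e. where dT/dt = 0."""
    r2 = sum(ri ** 2 for ri in r)
    v2 = sum(vi ** 2 for vi in v)
    if v2 == 0.0:
        return r2 / (2.0 * d * chi)  # pure diffusion limit
    return (-d * chi + math.sqrt((d * chi) ** 2 + v2 * r2)) / v2

# Arbitrary example values (assumptions for this sketch).
r, v = (3.0, 1.0), (0.5, 0.0)
times = [0.001 * k for k in range(1, 20001)]
t_numeric = max(times, key=lambda t: temperature(r, t, v))
```

Note that the maximum time depends only on the magnitudes of the displacement and the velocity, not on their relative direction, because the cross term in the exponent is constant in time.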



Network measures. To analyze the networks, we used the basic network measures 
degree and betweenness 2, as normalized measures to account for grid size effects: 

The degree $k_i$ of node i of a network with N nodes is given by the number of links 
attached to it, 

$$k_i = \frac{\sum_{j} A_{ij}}{N-1}, \qquad (14)$$

and the shortest path betweenness $b_i$ of a node i is defined as the number of all 
shortest paths that go through it, 

$$b_i = \frac{\sum_{j \neq k \neq i} \sigma_{jk}(i)/\sigma_{jk}}{(N-1)(N-2)}, \qquad (15)$$

where $\sigma_{jk}$ is the number of shortest paths connecting k and j and $\sigma_{jk}(i)$ is the number 
of those paths that go through i. 
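Both normalized measures can be computed from an adjacency matrix with a few lines of code. The sketch below is one possible implementation, not the one used for the paper: it assumes an unweighted, undirected network stored as a list of lists with at least three nodes, and it uses Brandes' accumulation scheme for the shortest-path counts. Summing over all source nodes counts every ordered pair of endpoints, which matches the (N-1)(N-2) normalization.

```python
from collections import deque

def normalized_degree(adj):
    """Number of links per node divided by N - 1."""
    n = len(adj)
    return [sum(row) / (n - 1) for row in adj]

def normalized_betweenness(adj):
    """Shortest-path betweenness (Brandes' algorithm), divided by (N - 1)(N - 2)."""
    n = len(adj)
    b = [0.0] * n
    for s in range(n):
        order = []                      # nodes in order of non-decreasing distance from s
        preds = [[] for _ in range(n)]  # predecessors on shortest paths from s
        sigma = [0] * n                 # number of shortest paths from s
        dist = [-1] * n
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in range(n):
                if not adj[u][w]:
                    continue
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = [0.0] * n
        for w in reversed(order):       # back-propagate path dependencies
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                b[w] += delta[w]
    return [value / ((n - 1) * (n - 2)) for value in b]

# Three nodes in a row: every shortest path between the ends passes the middle node.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
k = normalized_degree(path)
b = normalized_betweenness(path)
```

In the three-node example the middle node has normalized degree and betweenness 1, while the end nodes have degree 0.5 and betweenness 0.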

Data. The daily anomaly SST data is based on the optimum interpolation data (OI.v2) 
as provided by NOAA/NCDC 21,22, and the monthly averaged current velocity data 
was provided by the OSCAR Project Office (Earth and Space Research, Seattle). We 
used data from the region 120°-160°W, 15°S-15°N and for the time period August 
1996 to August 1997. The chosen year is neither an El Niño nor a La Niña year, and 
the results we present are largely robust against the choice of the 
particular year. The network is calculated by standard cross-correlation, allowing 
for a lag of up to one day. 
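A lagged cross-correlation between two daily time series can be sketched as below. The text does not specify how the lag is used, so this example makes an explicit assumption: it evaluates the Pearson correlation at lags of -1, 0 and +1 time steps and keeps the largest value. The function names and the short example series are illustrative only, and series with zero variance are not handled.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_correlation(x, y, max_lag=1):
    """Largest Pearson correlation between x(t) and y(t + lag) for |lag| <= max_lag.

    Taking the maximum over lags is an assumption of this sketch.
    """
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        best = max(best, pearson(a, b))
    return best

# Hypothetical series: y repeats x with a delay of one time step.
x = [0.0, 1.0, 4.0, 2.0, 5.0, 3.0, 7.0, 6.0]
y = [9.0] + x[:-1]
c = lagged_correlation(x, y, max_lag=1)
```

In this example the unlagged correlation is weak, whereas the one-step lag recovers a perfect correlation, which is the situation a small lag window is meant to capture.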